Automatically reducing machine learning model inputs

ABSTRACT

Various embodiments are generally directed to techniques to reduce inputs of a machine learning model (MLM) and increase path efficiency as a result. A method for reducing an MLM includes: receiving a machine learning (ML) dataset, partitioning the ML dataset into a first dataset, a second dataset, a third dataset, and a fourth dataset, training, validating, and testing the MLM using one or more of the first dataset, the second dataset, and the third dataset, after testing the MLM, automatically ranking an importance associated with each input of the MLM using the fourth dataset, and reducing a plurality of inputs of the MLM based on the automatic ranking.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/535,789, entitled “AUTOMATICALLY REDUCING MACHINE LEARNING MODEL INPUTS,” filed on Aug. 8, 2019. The contents of the aforementioned application are incorporated herein by reference in their entirety.

BACKGROUND

The present invention relates to machine learning, and more particularly to enhancing the efficiency of machine learning models.

As the utility and range of applications for machine learning models increases, so does the size and complexity of the networks associated with the models. For example, models based on node networks, including but not limited to neural networks, have thousands, millions, and even billions of nodes (hidden and unhidden), and similarly, have vast numbers of inputs that have various travel paths associated with the network. Enhancing the efficiency of path trajectories of machine learning models, including but not limited to models associated with vast node networks, is beneficial from both an efficiency and accuracy perspective.

SUMMARY

One aspect of the present disclosure includes an apparatus for reducing the inputs of one or more machine learning models. The apparatus includes: a memory to store instructions, and processing circuitry, coupled with the memory, operable to execute the instructions, that when executed, cause the processing circuitry to: apply a reduced input machine learning model (MLM) to an applied dataset, the reduced input MLM being derived from an original MLM, the reduced input MLM generated by the original MLM having been trained using a first dataset, including generating a plurality of weights for the original MLM, validated using a second dataset, including validating at least one hyperparameter of the original MLM, tested by a third dataset, including evaluating i) an accuracy ability, ii) a precision ability, and iii) a recall ability of the original MLM, and pruned such that a plurality of inputs of the original MLM are removed based on a ranking generated by a fourth dataset, where each of the first dataset, the second dataset, the third dataset, and the fourth dataset is distinct from one another.

Another aspect of the present disclosure includes a computer implemented method for reducing the inputs of one or more MLMs. The computer implemented method includes: receiving a machine learning (ML) dataset, partitioning the ML dataset into a first dataset, a second dataset, a third dataset, and a fourth dataset, training, by a computer processor, an MLM based on the first dataset, where the training includes generating a plurality of weights for the MLM, after training the MLM, validating at least one hyperparameter of the MLM based on the second dataset, after validating the MLM, testing i) an accuracy ability, ii) a precision ability, and iii) a recall ability using the third dataset, after testing the MLM, automatically ranking an importance associated with each input of the MLM using the fourth dataset, and reducing a plurality of inputs of the MLM based on the automatic ranking, where each of the first dataset, the second dataset, the third dataset, and the fourth dataset is distinct from one another.

Yet another aspect of the present disclosure includes a non-transitory computer-readable storage medium storing computer-readable program code for reducing one or more inputs of one or more MLMs, the code executable by a processor to: train, by a computer processor, an MLM using a first dataset, where the training includes generating a plurality of weights for the MLM, validate at least one hyperparameter of the MLM using a second dataset, test i) an accuracy ability, ii) a precision ability, and iii) a recall ability using a third dataset, automatically rank an importance associated with each input of the MLM using a fourth dataset, reduce a plurality of inputs of the MLM based on the automatic ranking, and after reducing the plurality of inputs of the MLM, retest the reduced input MLM using a fifth dataset, where the retesting includes testing i) an accuracy ability, ii) a precision ability, and iii) a recall ability of the reduced input MLM.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates an example of a system for reducing inputs of a machine learning model (MLM) according to at least one embodiment of the present disclosure.

FIG. 1B illustrates an example of a system for reducing inputs of an MLM according to at least one embodiment of the present disclosure.

FIG. 2A illustrates one type of distribution ranking for a plurality of MLM inputs according to at least one embodiment of the present disclosure.

FIG. 2B illustrates one type of distribution ranking for a plurality of MLM inputs according to at least one embodiment of the present disclosure.

FIG. 2C illustrates one type of plot of a summation of an importance score of MLM inputs in relation to the number of MLM inputs according to at least one embodiment of the present disclosure.

FIGS. 3A and 3B illustrate examples of one or more processing flows for reducing one or more inputs of an MLM according to at least one embodiment of the present disclosure.

FIG. 4 illustrates at least one example of one or more processing flows for reducing one or more inputs of an MLM according to at least one embodiment of the present disclosure.

FIG. 5 illustrates a machine learning system according to an embodiment of the present disclosure.

FIG. 6 illustrates an embodiment of a computing architecture useful with at least one embodiment of the present disclosure.

FIG. 7 illustrates an embodiment of a communications architecture useful with at least one embodiment of the present disclosure.

DETAILED DESCRIPTION

Various embodiments are generally directed to techniques, systems, and flows to improve the functionality and efficiency of machine learning models (MLM). One or more embodiments include a system and/or one or more components associated with a system or systems that can reduce the number of inputs of an MLM. In one or more embodiments, the system for reducing inputs of an MLM includes a component that automatically partitions one or more data sources into one or more datasets for training an MLM, testing the trained MLM, and/or validating the trained MLM. Once the MLM is fully trained (and, if applicable, validated and tested), one or more components of the system generate another dataset (e.g., a fourth dataset) from the one or more data sources, which is then used by one or more components of the system to reduce the number of inputs of the MLM. One or more components of the system can generate at least one distribution during the reduction operation that ranks the importance of each input of the MLM. The importance of each input is automatically determined by running a variance analysis by backpropagation, from the output to the inputs, using the fourth dataset, e.g., using the variance analysis and backpropagation technique to sum the value of each weight along the path associated with each input. Based on this summation, one or more components of the system can determine an importance value for each input of the MLM, and by extension, a basis for a distribution plot of importance in relation to input. The distribution ranking can then be used to reduce inputs with a smaller importance value in relation to other inputs, thus reducing the computing power required to use the machine learning model and/or the memory required to store it for subsequent use. Also, by reducing inputs that are redundant and/or of little value, the MLM's accuracy on future out-of-sample datasets (e.g., samples not used in training, validation, testing, etc.) is also increased (by avoiding paths in the network or networks associated with the model which adversely affect overall output).

The reduced input MLM can also be automatically retested using one or more training, testing, validation, and/or input reduction operations based on one or more additional data sources. This can further enhance the efficiency and accuracy of the model by further mitigating redundancies.

The system for reducing MLM inputs can look for gaps in the one or more importance distributions, such as the distance between a peak and a gap of a multiple-peak, e.g., multimodal, Gaussian distribution(s) representing the importance of the inputs (based on a variant analysis employing any suitable backpropagation technique). Based on the gaps, one or more components of the system can automatically reduce the number of inputs and then plot all remaining inputs in relation to a normalized importance summation, and then reduce the number of features that remained after the initial reduction.
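By way of illustration only (not part of the original disclosure), one simple way to act on such a gap is to sort the importance scores and cut at the largest drop between adjacent ranked inputs; the example scores, the cut rule, and the NumPy usage below are assumptions for a minimal sketch:

```python
import numpy as np

def cut_at_largest_gap(scores):
    # Rank inputs from most to least important.
    order = np.argsort(scores)[::-1]
    ranked = scores[order]
    # Gap between each pair of adjacent ranked importance values.
    drops = ranked[:-1] - ranked[1:]
    cut = int(np.argmax(drops)) + 1   # keep everything before the widest gap
    return order[:cut], order[cut:]

scores = np.array([0.30, 0.02, 0.25, 0.01, 0.28, 0.03, 0.11])
kept, dropped = cut_at_largest_gap(scores)
print("kept inputs:", kept, "dropped inputs:", dropped)
```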

One benefit of at least one embodiment of the present disclosure is making an MLM scalable for different applications and adjustable for different computing environments, without compromising the accuracy of the model. As inputs are reduced, an MLM can be used in computing environments with less memory and fewer computer processing resources, e.g., mobile phones, tablets, etc. Another benefit of at least one embodiment of the present disclosure, in addition to making an MLM more efficient from a computer resource perspective, is enhancing the accuracy of the MLM by pruning redundant inputs associated with the MLM, e.g., inputs that have a very low importance value (from a weighted perspective in relation to their individual paths) are removed and only higher value/importance inputs are employed.

Reference is now made to the drawings, wherein like reference numerals are used to refer to like elements throughout. In the following description, for the purpose of explanation, numerous specific details are set forth in order to provide a thorough understanding thereof. It may be evident, however, that the novel embodiments can be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form to facilitate a description thereof. The intention is to cover all modifications, equivalents, and alternatives within the scope of the claims.

FIG. 1A illustrates an example of a machine learning efficiency system 100 that can reduce one or more inputs of an MLM to increase the accuracy and efficiency of the MLM. The “units” or “components” described in the system, whether contained in memory or otherwise employed therein, can be any suitable software, logic (hardware or software), or hardware element specifically configured to perform or be used in the performance of one or more tasks or functions as discussed herein.

In one or more embodiments, the machine learning efficiency system 100 can include a machine learning efficiency unit 103, which in turn includes one or more processors 102, memory 104, storage 110, and a network interface 114. The one or more processors 102 can be any suitable software or hardware computer components for carrying out any operation as discussed herein. The memory 104 can be any suitable component or unit for storing protocols, information, algorithms, and/or instructions for execution by the one or more processors, e.g., the memory 104 may be any volatile and/or non-volatile memory capable of storing information during and/or for execution of instructions. The devices, systems, sources, units, and/or components of the machine learning efficiency unit 103 can be coupled to a network 111, e.g., the Internet, via one or more wired and/or wireless network links, and can be accessed by one or more network interfaces 114.

In one or more embodiments, the machine learning efficiency unit 103 can interact with one or more users or clients 130 . . . 130N (and associated user/client computing devices 131 . . . 131N, e.g., a laptop, mobile phone, tablet, or desktop computer) via a network interface 114 that can access the network 111, and the machine learning efficiency unit 103 can interact with one or more databases or data sources 120, such as one or more datasets, also via the network interface accessing the network 111. In one or more embodiments, the one or more data sources can include any suitable database or dataset 121A (a machine learning or “ML” dataset) that can be used in one or more operations involved with training and improving a machine learning model. The database or dataset (referred to interchangeably herein) 121A can be partitioned into one or more other datasets that each constitute a percentage of the data of the overall dataset 121A. For example, the partition can include a training set 121B for training an MLM, a validating set 121C for validating an MLM, a testing set 121D for testing an MLM, and a machine learning (ML) or applied dataset 122 that can be used to reduce the inputs of the MLM.

In one or more embodiments, an operation or operations associated with a training dataset 121B can involve training the initial MLM, an operation or operations associated with a validating set 121C can involve tuning hyperparameters of an MLM (and assessing the performance of an MLM in light of the hyperparameter tuning), and an operation or operations associated with a testing dataset 121D can involve testing the finally trained and tuned MLM. In one or more embodiments, each database or dataset 121B, 121C, and 121D contains no overlapping data, even if the sets 121B, 121C, and 121D stem from the same source 121A.
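By way of illustration only, a minimal sketch of such a four-way, non-overlapping partition (analogous to sets 121B, 121C, 121D, and 122) is shown below; the split fractions, the NumPy usage, and the synthetic data are assumptions rather than requirements of the disclosure:

```python
import numpy as np

def partition_dataset(X, y, fractions=(0.6, 0.2, 0.1, 0.1), seed=0):
    # Shuffle indices once, then slice them into four disjoint groups.
    assert abs(sum(fractions) - 1.0) < 1e-9
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    bounds = np.cumsum([int(f * len(X)) for f in fractions[:-1]])
    parts = np.split(idx, bounds)
    # Each group indexes different rows, so the four datasets share no data.
    return [(X[p], y[p]) for p in parts]

# Example usage with synthetic data standing in for an ML dataset such as 121A.
X = np.random.rand(1000, 20)
y = (X[:, 0] > 0.5).astype(int)
train_set, validate_set, test_set, applied_set = partition_dataset(X, y)
```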

In one or more embodiments, the applied dataset 122 can be a subset of dataset 121A or completely distinct therefrom. In one embodiment, the applied dataset 122 can have overlapping data with one or more of datasets 121B, 121C, and 121D, and in another embodiment, there is no overlap between the data contained in dataset 122 and sets 121B, 121C, and 121D.

In one or more embodiments, the memory 104 can include a machine learning input reduction unit 105 and an operating system 109, where the operating system 109 can be any suitable operating system compatible with system 100. In one or more embodiments, the machine learning input reduction unit 105 can further include a data processing component 106 and an input reduction component 108.

In one or more embodiments, the one or more components of the machine learning efficiency unit 103 perform one or more operations to train, validate, and test an MLM and then reduce the inputs associated therewith. In one or more embodiments, one or more components of the machine learning efficiency unit 103 can receive a fully or partially trained model and perform one or more operations to further develop the model, including reducing the inputs associated therewith. In one or more embodiments, one or more components of the machine learning efficiency unit can automatically partition one or more datasets, e.g., 121A, into multiple datasets, e.g., 121B-121D, and use the datasets to train a model from the beginning, and then perform one or more operations to reduce the inputs associated therewith. The operation to train, validate, test, and/or reduce the inputs of the MLM can be initiated by the one or more users 130 . . . 130N via one or more user computing devices 131 . . . 131N, or the operations can otherwise be manually or automatically initiated by any suitable entity.

In one or more embodiments, the machine learning efficiency unit 103 is configured to automatically, upon initiation by a user 130 . . . 130N or any other suitable trigger, partition machine learning model dataset 121A into training set 121B, validation set 121C, and testing set 121D. Thereafter, it will automatically train an MLM 112A using training set 121B, validate the machine learning model 112A using validation set 121C, and test the machine learning model 112A using testing set 121D. In one or more embodiments, the fully trained MLM 112A can be stored in any suitable storage component of the system 100, such as storage 110. Once the MLM 112A is trained, the machine learning efficiency unit 103 is configured to reduce or prune the inputs of the trained MLM 112A using applied dataset 122 by applying a variant analysis and backpropagation technique to the MLM 112A, thus generating a reduced (or pruned) input MLM 112B, where MLM 112B can also be stored in storage 110.

Embodiments are not limited in the above manner, and the above system is merely an exemplary embodiment for implementing one or more features of the present disclosure.

In one or more embodiments, one or more users 130 . . . 130N (or any other suitable entity that can make the request) can initiate a request to the machine learning efficiency unit 103 to train an MLM via network 111. Alternatively, an already trained MLM can be provided to the machine learning efficiency unit 103 for additional operations, e.g., a model stored on one or more user devices 131 . . . 131N.

In one or more embodiments, the data processing component 106 will automatically partition a dataset 121A into one or more datasets 121B, 121C, and 121D for the purposes outlined above and/or it will obtain a dataset, e.g., applied dataset 122, that can be used to reduce or prune the inputs of an MLM.

In one or more embodiments, the input reduction component 108 will perform one or more operations to reduce or prune the inputs of the MLM. In one or more embodiments, the MLM can be MLM 112A, where the one or more users 130 . . . 130N provide a trained MLM 112A, or alternatively, the input reduction component 108 will coordinate with base application component 107 to receive a base application or protocol for forming an MLM, e.g., the instructions and purpose associated with an MLM 112A to be trained, and the input reduction component will perform one or more operations to train an MLM 112A using the one or more data sources 120, where in one or more embodiments the trained MLM 112A (fully trained or partially trained) can be stored in storage 110.

In one or more embodiments, the input reduction component 108 can apply a backpropagation technique at the output of MLM 112A using applied dataset 122 in order to assess the importance of each input of the MLM 112A. The backpropagation may involve a variant analysis, e.g., any suitable statistical method that assesses the relative importance of the inputs in relation to one another, including an assessment of the weighted values of the nodes (or neurons in the case of a neural network) that follow the path of each input, starting at the output and tracing backward. After performing the backpropagation, and based on the results therefrom, the input reduction component 108 may reduce or prune one or more inputs of the MLM 112A, thus forming a reduced (or pruned) input MLM 112B. In one or more embodiments, the reduced (or pruned) input MLM 112B can be stored in storage 110.
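By way of illustration only, a minimal sketch of a backpropagation-based importance score for a small feed-forward network is shown below: the absolute gradient of the output with respect to each input, averaged over an applied dataset, is used as a proxy for that input's importance. The tiny two-layer network, the averaged-gradient scoring, and the NumPy implementation are assumptions rather than the specific variant analysis of the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))     # input -> hidden weights
W2 = rng.normal(size=(8, 1))     # hidden -> output weights
applied_X = rng.normal(size=(256, 4))   # stands in for applied dataset 122

def importance_scores(X, W1, W2):
    h = np.tanh(X @ W1)                    # forward pass, hidden activations
    # Backpropagate from the single output back toward the inputs.
    dh = (1.0 - h ** 2) * W2.T             # output weight times tanh derivative
    grads = dh @ W1.T                      # chain rule back to each input
    scores = np.abs(grads).mean(axis=0)    # average sensitivity per input
    return scores / scores.sum()           # normalize so the scores sum to 1

print(importance_scores(applied_X, W1, W2))
```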

In one or more embodiments, the input reduction component 108 may generate one or more plot distributions of the importance associated with each input of the MLM, e.g., a Gaussian distribution with one or more maximum and minimum peaks, e.g., a multimodal Gaussian distribution. The distribution(s) may plot the importance of each input of MLM 112A derived from the backpropagation operation, and based on the distributions, the input reduction component 108 may reduce or prune the inputs of MLM 112A to create reduced (or pruned) input MLM 112B.

In one or more embodiments, the reduced input MLM 112B offers one or more advantages as a result of having certain inputs removed, including improving the efficiency of the MLM by minimizing redundancy and errors inherent with certain paths or networks associated with particular inputs of MLM 112A, and, by reducing the computer resources required to process and store the reduced MLM 112B, making it more conducive to storage on smaller devices such as mobile phones and tablets.

FIG. 1B illustrates a configuration 101 for using one or more components of machine learning efficiency system 100 to apply a variant analysis and backpropagation technique to one or more MLMs, e.g., MLM 112A. In one or more embodiments, the machine learning efficiency unit 103 may interact with data sources 120 and MLM 112A as outlined above. The machine learning efficiency unit 103 can fully develop MLM 112A using datasets 121A, 121B, 121C, and 121D. Alternatively, a fully trained MLM 112A can be provided by one or more users or by any other suitable method. In one or more embodiments, the machine learning efficiency unit 103 may apply a backpropagation operation 220 to perform a variant analysis of each input of the fully trained MLM 112A. Although four inputs are shown in FIG. 1B for purposes of simplicity, in one or more embodiments, hundreds, thousands, millions, or more inputs are possible. In one or more embodiments, dataset 122 can be used by the machine learning efficiency unit 103 to perform the variant analysis and backpropagation technique. The backpropagation operation 220 can determine the importance of each input by performing any suitable machine learning or statistical analysis technique starting from the output A to the various inputs, e.g., determining the path importance associated with each input by assessing the accuracy of running data through certain paths and inputs and not others, and determining the accuracy or difference in accuracy when omitting and/or using one input in relation to another. Accordingly, in one or more embodiments, the variance or relative importance of each input is determined by running a separate dataset from the output A through the inputs.
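By way of illustration only, the accuracy-difference idea described above (omitting one input at a time and measuring the change in accuracy on a separate applied dataset) can be sketched as follows; the logistic-regression model, the synthetic data, and the zeroing strategy are assumptions used purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1200, 4))
y = (0.1 * X[:, 0] + 1.2 * X[:, 1] + 0.9 * X[:, 2] + 0.4 * X[:, 3] > 0).astype(int)
X_train, y_train = X[:1000], y[:1000]
X_applied, y_applied = X[1000:], y[1000:]      # stands in for applied dataset 122

model = LogisticRegression().fit(X_train, y_train)
baseline = accuracy_score(y_applied, model.predict(X_applied))

scores = []
for j in range(X_applied.shape[1]):
    X_ablate = X_applied.copy()
    X_ablate[:, j] = 0.0                        # omit one input at a time
    drop = baseline - accuracy_score(y_applied, model.predict(X_ablate))
    scores.append(max(drop, 0.0))
scores = np.array(scores) / max(np.sum(scores), 1e-12)   # normalize to sum to 1
print(dict(enumerate(np.round(scores, 3))))
```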

In the example provided in FIG. 1B, as stated, four inputs are shown: Input 1, Input 2, Input 3, and Input 4. The input weights represent the importance of each input and are normalized to add up to a particular number, where the summed number represents running every path and every input. For example, in FIG. 1B, the normalized number is “1” and the weights of each input are, e.g., “0.1” for Input 1, “0.4” for Input 2, “0.3” for Input 3, and “0.2” for Input 4. In one or more embodiments, once the machine learning efficiency unit 103 determines the importance of each input by backpropagation 220, it can plot the variant analysis results as one or more distributions or graphs, as shown in FIG. 2A, FIG. 2B, and FIG. 2C. Based on the plot results, the machine learning efficiency unit 103 can remove the least important input features. It is noted here, and discussed below, that FIGS. 2A-2C are directed to embodiments where there are a significant number of inputs or features (N), but the same analysis and technique can apply to FIG. 1B. As shown, after the backpropagation and variant analysis is performed by the machine learning efficiency unit 103, only Input 2 and Input 3, the features with the highest importance, remain, and all other features, i.e., Input 1 and Input 4, are removed, thus producing MLM 112B. In one or more embodiments, more than one graphical operation and/or plot (discussed below) is performed/used to ultimately determine the number of inputs to be reduced (or pruned) for the reduced input MLM.
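By way of illustration only, given the normalized weights discussed for FIG. 1B, keeping the highest-ranked inputs and dropping the rest can be sketched as follows; the choice to keep exactly two inputs is an assumption made for this example:

```python
# Normalized importance weights as described in the FIG. 1B discussion.
importance = {"Input 1": 0.1, "Input 2": 0.4, "Input 3": 0.3, "Input 4": 0.2}

def prune_inputs(importance, keep=2):
    # Rank inputs by importance and split into kept and removed groups.
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:keep]), dict(ranked[keep:])

kept, removed = prune_inputs(importance)
print("kept:", kept)        # {'Input 2': 0.4, 'Input 3': 0.3}
print("removed:", removed)  # {'Input 4': 0.2, 'Input 1': 0.1}
```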

FIG. 2A illustrates one type of graphical result 200A that can be obtained from one or more components of the present disclosure performing a backpropagation and variant analysis on an MLM. The graphical result 200A is a simple Gaussian distribution that has a single peak “Peak A” associated therewith. Importance (I) is plotted with respect to every feature or input of the MLM, e.g., labeled “Features(N),” in 200A. Peak A represents the input with the highest importance, with all inputs thereafter having a declining value. In this embodiment, a threshold (not shown) can be set after Peak A, and any inputs after the threshold are automatically excluded from the reduced input MLM.

FIG. 2B illustrates one type of graphical result 200B that can be obtained from one or more components of the present disclosure performing a backpropagation and variant analysis on an MLM. The graphical result 200B represents a more complicated distribution with multiple peaks (e.g., multimodal), e.g., Peak 1, Peak 2, and Peak 3, and multiple gaps, e.g., Gap 1, Gap 2, and Gap 3, in between the peaks. As with FIG. 2A, Importance (I) is plotted with respect to every feature or input of the MLM, e.g., labeled “Features(N).” In one or more embodiments, the machine learning efficiency unit 103 can be configured to eliminate a pre-determined number of inputs between gaps and peaks and/or perform a more sophisticated analysis that eliminates inputs based on a particular mathematical computation, e.g., a derivative computation, in relation to one part of the curve 200B and another part of the curve 200B, e.g., a determined descending derivative value through the gaps and up until an ascending derivative value is reached. Once this initial reduction takes place, the machine learning efficiency unit can generate a second curve 200C as shown in FIG. 2C. The curve plots the normalized Importance Sum Σ(I) in relation to the Sum of Features Σ(N), i.e., the sum of the importance of the inputs that were kept from the graphical result 200B. In other words, the curve represents an ascending sum of the kept inputs from result 200B, the weight of the first feature or input added to the next feature or input and so on, from the most important feature or input A, all the way through A added with all inputs up to and including “N−n.” The maximum value of the Importance Sum Σ(I) is a normalized value that would result if all inputs were included, e.g., “1.” The feature “N−n” represents the input where a dramatic reduction in the increase of the sum occurs as the weighted value of a feature or input is added to the preceding value. That occurrence (dramatic reduction) and the associated feature or input can be determined using any suitable mathematical operation for detecting a slowdown in the growth of a curve. Once that value and the associated feature or input N−n is determined, all subsequent features or inputs are omitted from the final reduced input MLM.
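By way of illustration only, a plot-free sketch of the cumulative curve of FIG. 2C is shown below: inputs are added from most to least important, and the point where the marginal increase of the normalized sum becomes negligible is taken as the cutoff corresponding to N−n. The tolerance value and example scores are assumptions:

```python
import numpy as np

def cumulative_cutoff(scores, tol=0.01):
    # Sort importance scores from largest to smallest and normalize to sum to 1.
    ranked = np.sort(scores)[::-1]
    ranked = ranked / ranked.sum()
    cumulative = np.cumsum(ranked)
    # First position where adding the next input grows the sum by less than tol.
    small_gains = np.where(ranked < tol)[0]
    cutoff = int(small_gains[0]) if small_gains.size else len(ranked)
    return cutoff, cumulative

scores = np.array([0.35, 0.25, 0.2, 0.1, 0.05, 0.03, 0.015, 0.005])
cutoff, curve = cumulative_cutoff(scores)
print(f"keep the {cutoff} most important inputs; cumulative curve: {curve}")
```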

Although with respect to FIG. 2B and FIG. 2C two graphical operations are performed to obtain the reduced input MLM, either of the two types of graphical operations can be performed multiple times to further reduce inputs as needed for a particular application, e.g., if the MLM is intended for a device with limited storage, this operation could be carried out multiple times until the size of the reduced input MLM aligns with the memory capacity of that device. Similarly, although the above discussion is limited to training, validating, and testing an MLM using three distinct datasets, e.g., 121B, 121C, and 121D, and then a fourth dataset to reduce the inputs of the MLM, after a reduced MLM, e.g., 112B, is created, it can be subsequently retrained and retested with different datasets, e.g., a fifth dataset for training the reduced input MLM, a sixth dataset for validating the reduced input MLM, and a seventh dataset for testing the reduced input MLM, where the fifth, sixth, and seventh datasets are distinct from datasets 121B, 121C, 121D, and 122 and do not share any overlapping data in relation thereto. Once the reduced MLM 112B is fully retrained, another dataset, e.g., an eighth dataset, can be used to reduce the inputs of the reduced input MLM using another iteration of backpropagation and variant analysis by one or more components as described herein, where the eighth dataset is distinct and does not share any overlapping data with respect to 121B, 121C, 121D, 122, and any other datasets used on the reduced input MLM, e.g., the fifth, sixth, and seventh datasets.

FIG. 3A illustrates an example of a logic flow 300A that may be representative of some or all of the operations executed by one or more embodiments described herein. For example, the logic flow 300A may illustrate operations performed by a machine learning efficiency system to reduce one or more inputs of an MLM.

At block 305, one or more embodiments may include receiving a generic machine learning dataset or ML dataset. The machine learning dataset can be provided by one or more users over a network and to a device or component as described in one or more embodiments herein, or it may be pre-stored in a system or component as described herein.

At block 310, one or more embodiments involve automatically partitioning, using any suitable component as described herein or otherwise consistent with the teachings as outlined herein, the ML dataset into one or more other datasets for fully developing an MLM, including a first dataset, a second dataset, a third dataset, and a fourth dataset. In one or more embodiments, the first dataset can be for training the MLM, the second dataset can be for validating the MLM, the third dataset can be for testing the MLM, and the fourth dataset can be for reducing one or more inputs of the MLM. In one or more embodiments, the first, second, third, and fourth datasets do not share any data in common.

At block 315, one or more embodiments involve automatically training the MLM using the first dataset. In one or more embodiments, the first (training) dataset may contain the largest amount of data in relation to the second dataset, the third dataset, and the fourth dataset, and one or more components of any suitable system as described herein may provide a learning algorithm, the type of which is contingent on the type of MLM ultimately preferable to a particular application or use-case, and use that learning algorithm to train the MLM.

At block 320, one or more embodiments involve automatically validating the MLM using the second dataset. In one or more embodiments, the second (validation) dataset may be the second largest dataset with respect to the first dataset, the second dataset, the third dataset, and the fourth dataset. In one or more embodiments, the trained MLM from block 315 will be validated using the second dataset, e.g., one or more hyperparameters or any relevant external factor that can affect the MLM in real-time usage will be tuned or optimized using the second dataset. In one or more embodiments, factors that can be addressed in validation include but are not limited to the optimal size of input data, optimal prediction outputs, and the scope and/or a classifiable difference between inputs.
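By way of illustration only, hyperparameter selection against a held-out validation set can be sketched as follows; the logistic-regression model, the candidate regularization values, the synthetic data, and the use of scikit-learn are assumptions rather than requirements of block 320:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(600, 10)), rng.integers(0, 2, 600)
X_val, y_val = rng.normal(size=(200, 10)), rng.integers(0, 2, 200)

# Try each candidate hyperparameter on the training set and keep the one
# that performs best on the separate validation set.
best_c, best_score = None, -1.0
for c in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    score = accuracy_score(y_val, model.predict(X_val))
    if score > best_score:
        best_c, best_score = c, score
print(f"selected C={best_c} with validation accuracy {best_score:.3f}")
```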

At block 325, one or more embodiments involve automatically testing the MLM using the third dataset. In one or more embodiments, the third (testing) dataset may be the third largest dataset with respect to the first dataset, the second dataset, the third dataset, and the fourth dataset. In one or more embodiments, the validated MLM from block 320 may be tested to determine any suitable error rate, where the error rate can be adjusted by reperforming the operations of blocks 310-320. In one or more embodiments, the MLM will be tested with respect to one or more of an accuracy ability or feature, a precision ability or feature, and/or a recall ability or feature.
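By way of illustration only, evaluating the accuracy, precision, and recall abilities on a held-out testing dataset can be sketched as follows; the labels and predictions are placeholder values, and the scikit-learn metric functions are an assumed tooling choice:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Placeholder ground-truth labels and model predictions for the testing dataset.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
```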

At block 330, one or more embodiments involve ranking an importance of one or more inputs of the MLM using the fourth dataset. In one or more embodiments, the fourth (applied or ML) dataset may be the smallest dataset with respect to the first, second, third, and fourth datasets. The ranking can be any suitable ranking technique that considers the weight or importance of each input in relation to a path trajectory associated with that input and the MLM, and/or any suitable ranking technique that considers the accuracy or error-rate associated with a particular input when processed using the fourth dataset.

At block 335, one or more embodiments involve reducing the inputs of the MLM, after it has been fully developed (trained, validated, and tested) and the inputs are ranked in accordance with block 330. The reduction can take any approach to reduce the inputs based on the rankings, as described herein, or as otherwise suitable, including but not limited to reducing inputs based on a graphical distribution.

FIG. 3B illustrates an example of a logic flow 300B that may be representative of some or all of the operations executed by one or more embodiments described herein. For example, the logic flow 300B may illustrate operations performed by a machine learning efficiency system to reduce one or more inputs of an MLM. In one or more embodiments, the flow begins at block 325 of flow 300A, and from block 325, proceeds directly to block 340.

At block 340, one or more embodiments involve reducing a plurality of inputs of an MLM using a ranking based on a variant analysis and backpropagation technique. The variant analysis can apply the fourth dataset at the output of the trained, validated, and tested MLM to backpropagate a determination as to the importance of each input of the MLM. One or more graphical distributions of the importance value of each input of the MLM can be determined based on the backpropagation technique. In one or more embodiments, based on at least one graphical distribution, e.g., a topography of a Gaussian distribution, inputs of the MLM are removed, and the reduced MLM can be used in one or more applications as intended. In one or more other embodiments, the remaining inputs of the reduced input MLM will be plotted a second time in relation to importance, and based on the topography of the second graph, additional inputs of the reduced input MLM will be reduced.

At block 345, one or more embodiments involve training or retraining the reduced input MLM using a fifth dataset. The retraining can be in accordance with the techniques described herein, including the operation or operations associated with block 315, provided that, in one or more embodiments, the fifth dataset does not share any data in common with the first, second, third, and fourth datasets (and, as discussed below, with respect to the sixth, seventh, and eighth datasets). In one or more embodiments, the fifth dataset can be derived from the ML dataset of 305, or it can be derived from a completely new source.

At block 350, one or more embodiments involve validating or revalidating the reduced input MLM using a sixth dataset. The revalidating can be in accordance with the techniques described herein, including the operation or operations associated with block 320, provided that, in one or more embodiments, the sixth dataset does not share any data in common with the first, second, third, fourth, and fifth datasets (and, as discussed below, with respect to the seventh and eighth datasets). In one or more embodiments, the sixth dataset can be derived from the ML dataset of 305, or it can be derived from a completely new source.

At block 355, one or more embodiments involve testing or retesting the reduced input MLM using a seventh dataset. The retesting can be in accordance with the techniques described herein, including the operation or operations associated with block 325, provided that, in one or more embodiments, the seventh dataset does not share any data in common with the first, second, third, fourth, fifth, and sixth datasets (and, as discussed below, the eighth dataset). In one or more embodiments, the seventh dataset can be derived from the ML dataset of 305, or it can be derived from a completely new source.

At block 360, one or more embodiments involve further reducing the reduced input MLM using an eighth dataset. The reduction of the inputs can be in accordance with any of the techniques discussed herein, including one or more operations of block 340 (performing another backpropagation and variance analysis), provided that, in one or more embodiments, the eighth dataset does not share any data in common with the first, second, third, fourth, fifth, sixth, and seventh datasets. In one or more embodiments, the eighth dataset can be derived from the ML dataset of 305, or it can be derived from a completely new source.

In one or more embodiments, performing one or more operations of flow 300B can further enhance the accuracy of a reduced input MLM, and with respect to one or more operations of block 360, further reducing the inputs of an already reduced input MLM can make the model suitable for a smaller device or another scenario where computing and/or memory resources are limited, without compromising the utility and/or accuracy of the final model when used for its intended purpose.

FIG. 4 illustrates an example of a logic flow 400 that may be representative of some or all of the operations executed by one or more embodiments described herein. For example, the logic flow 400 may illustrate operations performed by a machine learning efficiency system to reduce one or more inputs of an MLM.

At block 405, one or more embodiments may include receiving a generic machine learning dataset or ML dataset. The machine learning dataset can be provided by one or more users over a network and to a device or component as described in one or more embodiments herein, or it may be pre-stored in a system or component as described herein.

At block 410, one or more embodiments involve automatically partitioning, using any suitable component as described herein or otherwise consistent with the teachings as outlined herein, the ML dataset into one or more other datasets for fully developing an MLM, including a first dataset, a second dataset, a third dataset, and a fourth dataset. In one or more embodiments, the first dataset can be for training the MLM, the second dataset can be for validating the MLM, the third dataset can be for testing the MLM, and the fourth dataset can be for reducing one or more inputs of the MLM. In one or more embodiments, the first, second, third, and fourth datasets do not share any data in common.

At block 415, one or more embodiments involve automatically training the MLM using the first dataset. In one or more embodiments, the first (training) dataset may contain the largest amount of data in relation to the second dataset, the third dataset, and the fourth dataset, and one or more components of any suitable system as described herein may provide a learning algorithm, the type of which is contingent on the type of MLM ultimately preferable to a particular application or use-case, and use that learning algorithm to train the MLM.

At block 420, one or more embodiments involve automatically validating the MLM using the second dataset. In one or more embodiments, the second (validation) dataset may be the second largest dataset with respect to the first dataset, the second dataset, the third dataset, and the fourth dataset. In one or more embodiments, the trained MLM from block 415 may be validated using the second dataset, e.g., one or more hyperparameters or any relevant external factor that can affect the MLM in real-time usage will be tuned or optimized using the second dataset. In one or more embodiments, factors that can be addressed in validation include but are not limited to the optimal size of input data, optimal prediction outputs, and the scope and/or the classifiable difference between inputs.

At block 425, one or more embodiments involve automatically testing the MLM using the third dataset. In one or more embodiments, the third (testing) dataset may be the third largest dataset with respect to the first dataset, the second dataset, the third dataset, and the fourth dataset. In one or more embodiments, the validated MLM from block 420 may be tested to determine any suitable error rate, where the error rate can be adjusted by reperforming the operations of blocks 410-420. In one or more embodiments, the MLM may be tested with respect to one or more of an accuracy ability or feature, a precision ability or feature, and/or a recall ability or feature.

At block 430, one or more embodiments involve generating, using the fourth dataset, a normalized importance distribution of a plurality of inputs of the MLM based on a variant backpropagation analysis of the plurality of inputs. The normalized importance distribution can be a first graph that plots the progressive aggregated summation of a weighted value representing each input in relation to a single value that would result from keeping every input of the MLM, e.g., 1, or the normalized importance distribution can be a second graph that is based on a reduced set of inputs derived from a first graph that plotted each input in relation to its weighted importance value. In one or more embodiments, the second graph could be a progressive aggregated summation of a weighted value representing each input in relation to a single value that would result from keeping every input of the MLM, e.g., 1.

At block 435, one or more embodiments involve performing at least one additional operation on the reduced input MLM based on another dataset and removing at least one additional input from the reduced MLM based on that at least one operation. The at least one additional operation can be one or more of an additional training operation, validating operation, testing operation, or input reduction operation, where each additional operation can use a dataset distinct from the first, second, third, and fourth datasets, and with respect to the dataset used in each other additional operation.

FIG. 5 illustrates an example of a machine learning efficiency system 506. The machine learning efficiency system 506 includes one or more processor(s) 532, memory 534, storage 536, one or more interface(s) 538, and one or more I/O device(s) 540. In embodiments, the machine learning efficiency system 506 may be a processing system that includes one or more servers or computing devices that are interconnected via one or more network links, e.g., wired, wireless, fiber, etc. In some instances, the machine learning efficiency system 506 may be a distributed computing system. Each of the servers may include one or more processor(s) 532, which may include one or more processing cores to process information and data. Moreover, the one or more processors 532 can include one or more processing devices, such as a microprocessor manufactured by Intel™, AMD™, or any of various processors. The disclosed embodiments are not limited to any type of processor(s).

Memory 534 can include one or more memory (volatile or non-volatile) devices configured to store instructions used by the one or more processors 532 to perform one or more operations consistent with the disclosed embodiments. For example, memory 534 can be configured with one or more software instructions, such as programs that can perform one or more operations when executed by the one or more processors 532. The disclosed embodiments are not limited to separate programs or computers configured to perform dedicated tasks. For example, memory 534 can include a single program that performs the operations or could comprise multiple programs. Memory 534 can also store data that can reflect any type of information in any format that the system can use to perform operations consistent with the disclosed embodiments.

In embodiments, the machine learning efficiency system 506 may include one or more storage devices 536. The storage devices 536 may include HDDs, flash memory devices, optical storage devices, floppy storage devices, etc. In some instances, the storage devices 536 may include cloud-based storage devices that may be accessed via a network interface. In some embodiments, the storage 536 may be configured to store one or more databases and/or as a distributed database system to store information and data. Databases can include one or more memory devices that store information and are accessed and/or managed through the machine learning efficiency system 506. By way of example, databases can include Oracle™ databases, Sybase™ databases, or other relational databases or non-relational databases, such as Hadoop sequence files, HBase, or Cassandra. The databases or other files can include, for example, data and information related to the source and destination of a network request, the data contained in the request, transaction information, etc. Systems and methods of disclosed embodiments, however, are not limited to separate databases. In one aspect, the machine learning efficiency system 506 can include databases located remotely from other machine learning efficiency system 506 devices. The databases can include computing components (e.g., database management system, database server, etc.) configured to receive and process requests for data stored in memory devices of the databases and to provide data from the databases.

FIG. 6 illustrates an embodiment of an exemplary computing architecture 600 suitable for implementing various embodiments as previously described. In one embodiment, the computing architecture 600 may include or be implemented as part of system 100.

As used in this application, the terms “system” and “component” are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution, examples of which are provided by the exemplary computing architecture 600. For example, a component can be, but is not limited to being, a process running on a processor, a processor, a hard disk drive, multiple storage drives (of optical and/or magnetic storage medium), an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a server and the server can be a component. One or more components can reside within a process and/or thread of execution, and a component can be localized on one computer and/or distributed between two or more computers. Further, components may be communicatively coupled to each other by various types of communications media to coordinate operations. The coordination may involve the uni-directional or bi-directional exchange of information. For instance, the components may communicate information in the form of signals communicated over the communications media. The information can be implemented as signals allocated to various signal lines. In such allocations, each message is a signal. Further embodiments, however, may alternatively employ data messages. Such data messages may be sent across various connections. Exemplary connections include parallel interfaces, serial interfaces, and bus interfaces.

The computing architecture 600 includes various common computing elements, such as one or more processors, multi-core processors, co-processors, memory units, chipsets, controllers, peripherals, interfaces, oscillators, timing devices, video cards, audio cards, multimedia input/output (I/O) components, power supplies, and so forth. The embodiments, however, are not limited to implementation by the computing architecture 600.

As shown in FIG. 6, the computing architecture 600 includes a processing unit 604, a system memory 606, and a system bus 608. The processing unit 604 can be any of various commercially available processors.

The system bus 608 provides an interface for system components including, but not limited to, the system memory 606 to the processing unit 604. The system bus 608 can be any of several types of bus structure that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and a local bus using any of a variety of commercially available bus architectures. Interface adapters may connect to the system bus 608 via a slot architecture. Example slot architectures may include without limitation Accelerated Graphics Port (AGP), Card Bus, (Extended) Industry Standard Architecture ((E)ISA), Micro Channel Architecture (MCA), NuBus, Peripheral Component Interconnect (Extended) (PCI(X)), PCI Express, Personal Computer Memory Card International Association (PCMCIA), and the like.

The computing architecture 600 may include or implement various articles of manufacture. An article of manufacture may include a computer-readable storage medium to store logic. Examples of a computer-readable storage medium may include any tangible media capable of storing electronic data, including volatile memory or non-volatile memory, removable or non-removable memory, erasable or non-erasable memory, writeable or re-writeable memory, and so forth. Examples of logic may include executable computer program instructions implemented using any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, object-oriented code, visual code, and the like. Embodiments may also be at least partly implemented as instructions contained in or on a non-transitory computer-readable medium, which may be read and executed by one or more processors to enable performance of the operations described herein.

The system memory 606 may include various types of computer-readable storage media in the form of one or more higher speed memory units, such as read-only memory (ROM), random-access memory (RAM), dynamic RAM (DRAM), Double-Data-Rate DRAM (DDRAM), synchronous DRAM (SDRAM), static RAM (SRAM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, polymer memory such as ferroelectric polymer memory, ovonic memory, phase change or ferroelectric memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magnetic or optical cards, an array of devices such as Redundant Array of Independent Disks (RAID) drives, solid state memory devices (e.g., USB memory, solid state drives (SSD)), and any other type of storage media suitable for storing information. In the illustrated embodiment shown in FIG. 6, the system memory 606 can include non-volatile memory 610 and/or volatile memory 612. A basic input/output system (BIOS) can be stored in the non-volatile memory 610.

The computer 602 may include various types of computer-readable storage media in the form of one or more lower speed memory units, including an internal (or external) hard disk drive (HDD) 614, a magnetic floppy disk drive (FDD) 616 to read from or write to a removable magnetic disk 618, and an optical disk drive 620 to read from or write to a removable optical disk 622 (e.g., a CD-ROM or DVD). The HDD 614, FDD 616, and optical disk drive 620 can be connected to the system bus 608 by an HDD interface 624, an FDD interface 626, and an optical drive interface 628, respectively. The HDD interface 624 for external drive implementations can include at least one or both of Universal Serial Bus (USB) and IEEE 1394 interface technologies.

The drives and associated computer-readable media provide volatile and/or nonvolatile storage of data, data structures, computer-executable instructions, and so forth. For example, a number of program modules can be stored in the drives and memory units 610, 612, including an operating system 630, one or more application programs 632, other program modules 634, and program data 636. In one embodiment, the one or more application programs 632, other program modules 634, and program data 636 can include, for example, the various applications and/or components of the system 700.

A user can enter commands and information into the computer 602 through one or more wire/wireless input devices, for example, a keyboard 638 and a pointing device, such as a mouse 640. Other input devices may include microphones, infra-red (IR) remote controls, radio-frequency (RF) remote controls, game pads, stylus pens, card readers, dongles, finger print readers, gloves, graphics tablets, joysticks, keyboards, retina readers, touch screens (e.g., capacitive, resistive, etc.), trackballs, track pads, sensors, styluses, and the like. These and other input devices are often connected to the processing unit 604 through an input device interface 642 that is coupled to the system bus 608 but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, and so forth.

A monitor 644 or other type of display device is also connected to the system bus 608 via an interface, such as a video adaptor 646. The monitor 644 may be internal or external to the computer 602. In addition to the monitor 644, a computer typically includes other peripheral output devices, such as speakers, printers, and so forth. The computer 602 may operate in a networked environment using logical connections via wire and/or wireless communications to one or more remote computers, such as a remote computer 648. The remote computer 648 can be a workstation, a server computer, a router, a personal computer, portable computer, microprocessor-based entertainment appliance, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 602, although, for purposes of brevity, only a memory/storage device 650 is illustrated. The logical connections depicted include wire/wireless connectivity to a local area network (LAN) 652 and/or larger networks, for example, a wide area network (WAN) 654. Such LAN and WAN networking environments are commonplace in offices and companies, and facilitate enterprise-wide computer networks, such as intranets, all of which may connect to a global communications network, for example, the Internet.

When used in a LAN networking environment, the computer 602 is connected to the LAN 652 through a wire and/or wireless communication network interface or adaptor 656. The adaptor 656 can facilitate wire and/or wireless communications to the LAN 652, which may also include a wireless access point disposed thereon for communicating with the wireless functionality of the adaptor 656.

When used in a WAN networking environment, the computer 602 can include a modem 658, or is connected to a communications server on the WAN 654, or has other means for establishing communications over the WAN 654, such as by way of the Internet. The modem 658, which can be internal or external and a wire and/or wireless device, connects to the system bus 608 via the input device interface 642. In a networked environment, program modules depicted relative to the computer 602, or portions thereof, can be stored in the remote memory/storage device 650. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers can be used.

The computer 602 is operable to communicate with wire and wireless devices or entities using the IEEE 802 family of standards, such as wireless devices operatively disposed in wireless communication (e.g., IEEE 802.11 over-the-air modulation techniques). This includes at least Wi-Fi (or Wireless Fidelity), WiMax, and Bluetooth™ wireless technologies, among others. Thus, the communication can be a predefined structure as with a conventional network or simply an ad hoc communication between at least two devices. Wi-Fi networks use radio technologies called IEEE 802.11 (a, b, g, n, etc.) to provide secure, reliable, fast wireless connectivity. A Wi-Fi network can be used to connect computers to each other, to the Internet, and to wire networks (which use IEEE 802.3-related media and functions).

The various elements of the devices as previously described with reference to FIGS. 1-5 may include various hardware elements, software elements, or a combination of both. Examples of hardware elements may include devices, logic devices, components, processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), memory units, logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. Examples of software elements may include software components, programs, applications, computer programs, application programs, system programs, software development programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. However, determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints, as desired for a given implementation.

FIG. 7 is a block diagram depicting an exemplary communications architecture 700 suitable for implementing various embodiments as previously described. The communications architecture 700 includes various common communications elements, such as a transmitter, receiver, transceiver, radio, network interface, baseband processor, antenna, amplifiers, filters, power supplies, and so forth. The embodiments, however, are not limited to implementation by the communications architecture 700, which may be consistent with system 100.

As shown in FIG. 7, the communications architecture 700 includes one or more clients 702 and servers 704. The servers 704 may implement the server device 526. The clients 702 and the servers 704 are operatively connected to one or more respective client data stores 706 and server data stores 707 that can be employed to store information local to the respective clients 702 and servers 704, such as cookies and/or associated contextual information.

The clients 702 and the servers 704 may communicate information between each other using a communications framework 710. The communications framework 710 may implement any well-known communications techniques and protocols. The communications framework 710 may be implemented as a packet-switched network (e.g., public networks such as the Internet, private networks such as an enterprise intranet, and so forth), a circuit-switched network (e.g., the public switched telephone network), or a combination of a packet-switched network and a circuit-switched network (with suitable gateways and translators).

The communications framework 710 may implement various network interfaces arranged to accept, communicate, and connect to a communications network. A network interface may be regarded as a specialized form of an input/output (I/O) interface. Network interfaces may employ connection protocols including without limitation direct connect, Ethernet (e.g., thick, thin, twisted pair 10/100/1000 Base T, and the like), token ring, wireless network interfaces, cellular network interfaces, IEEE 802.11a-x network interfaces, IEEE 802.16 network interfaces, IEEE 802.20 network interfaces, and the like. Further, multiple network interfaces may be used to engage with various communications network types. For example, multiple network interfaces may be employed to allow for the communication over broadcast, multicast, and unicast networks. Should processing requirements dictate a greater amount of speed and capacity, distributed network controller architectures may similarly be employed to pool, load balance, and otherwise increase the communicative bandwidth required by the clients 702 and the servers 704. A communications network may be any one or a combination of wired and/or wireless networks including without limitation a direct interconnection, a secured custom connection, a private network (e.g., an enterprise intranet), a public network (e.g., the Internet), a Personal Area Network (PAN), a Local Area Network (LAN), a Metropolitan Area Network (MAN), an Operating Missions as Nodes on the Internet (OMNI), a Wide Area Network (WAN), a wireless network, a cellular network, and other communications networks.
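By way of a purely illustrative, non-limiting sketch (not a description of any particular embodiment), the exchange between a client 702 and a server 704 over a packet-switched communications framework 710 can be pictured as a single loopback socket round trip. The host, the OS-assigned port, and the message contents below are hypothetical.

    # Minimal, hypothetical sketch of a client/server exchange over a
    # packet-switched (TCP/IP) communications framework; all endpoints and
    # payloads are illustrative only.
    import socket
    import threading

    HOST = "127.0.0.1"

    # "Server 704": bind and listen before the client connects, so the
    # client cannot race the listener.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((HOST, 0))            # port 0: let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def serve_once():
        """Accept one connection and echo the request back with an ack prefix."""
        conn, _addr = srv.accept()
        with conn:
            # Single recv is a simplification; a real server would loop.
            conn.sendall(b"ack: " + conn.recv(1024))
        srv.close()

    threading.Thread(target=serve_once, daemon=True).start()

    # "Client 702": connect through the framework and exchange one message,
    # analogous to storing/retrieving contextual information.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, port))
        cli.sendall(b"contextual information")
        print(cli.recv(1024).decode())   # -> ack: contextual information

In practice the framework 710 may interpose any of the wired or wireless network interfaces enumerated above; the sketch only fixes the ideas of binding, connecting, and exchanging a message between a client and a server.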

The components and features of the devices described above may be implemented using any combination of discrete circuitry, application specific integrated circuits (ASICs), logic gates, and/or single chip architectures. Further, the features of the devices may be implemented using microcontrollers, programmable logic arrays, and/or microprocessors, or any combination of the foregoing where suitably appropriate. It is noted that hardware, firmware, and/or software elements may be collectively or individually referred to herein as “logic” or “circuit.”

It will be appreciated that the exemplary devices shown in the block diagrams described above may represent one functionally descriptive example of many potential implementations. Accordingly, division, omission, or inclusion of block functions depicted in the accompanying figures does not imply that the hardware components, circuits, software, and/or elements for implementing these functions would necessarily be divided, omitted, or included in embodiments.

At least one computer-readable storage medium may include instructions that, when executed, cause a system to perform any of the computer-implemented methods described herein.

Some embodiments may be described using the expression “one embodiment” or “an embodiment” along with their derivatives. These terms mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment. Moreover, unless otherwise noted, the features described above are recognized to be usable together in any combination. Thus, any features discussed separately may be employed in combination with each other unless it is noted that the features are incompatible with each other.

With general reference to notations and nomenclature used herein, the detailed descriptions herein may be presented in terms of program procedures executed on a computer or network of computers. These procedural descriptions and representations are used by those skilled in the art to most effectively convey the substance of their work to others skilled in the art.

A procedure is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. These operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be noted, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to those quantities.

Further, the manipulations performed are often referred to in terms, such as adding or comparing, which are commonly associated with mental operations performed by a human operator. No such capability of a human operator is necessary, or desirable in most cases, in any of the operations described herein, which form part of one or more embodiments. Rather, the operations are machine operations.

Some embodiments may be described using the expression “coupled” and “connected” along with their derivatives. These terms are not necessarily intended as synonyms for each other. For example, some embodiments may be described using the terms “connected” and/or “coupled” to indicate that two or more elements are in direct physical or electrical contact with each other. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, yet still co-operate or interact with each other.

Various embodiments also relate to apparatus or systems for performing these operations. This apparatus may be specially constructed for the required purpose and may be selectively activated or reconfigured by a computer program stored in the computer. The procedures presented herein are not inherently related to a particular computer or other apparatus. The required structure for a variety of these machines will appear from the description given.

It is emphasized that the Abstract of the Disclosure is provided to allow a reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein,” respectively. Moreover, the terms “first,” “second,” “third,” and so forth, are used merely as labels, and are not intended to impose numerical requirements on their objects. What has been described above includes examples of the disclosed architecture. It is, of course, not possible to describe every conceivable combination of components and/or methodologies, but one of ordinary skill in the art may recognize that many further combinations and permutations are possible. Accordingly, the novel architecture is intended to embrace all such alterations, modifications, and variations that fall within the spirit and scope of the appended claims.
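For convenience of illustration only, a non-limiting code sketch of the claimed input-pruning approach is provided following the claims.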

What is claimed is:
 1. A method, comprising: generating, by a computer processor, a reduced input machine learning model (MLM) based on training an original MLM, wherein generating the reduced input MLM comprises: training the original MLM using a first dataset, to generate a plurality of weights for the original MLM; pruning the reduced input MLM such that a first input of a plurality of inputs of the original MLM is not included as an input of the reduced input MLM; determining that a size of the reduced input MLM exceeds a memory threshold of a target device; and based on the determination that the size of the reduced input MLM is greater than the memory threshold of the target device, pruning the reduced input MLM to remove a second input of the plurality of inputs of the original MLM included in the reduced input MLM to generate a second reduced MLM; and processing, by the second reduced MLM executing on the processor, an applied dataset.
 2. The method of claim 1, wherein pruning the reduced input MLM comprises: automatically ranking each of the plurality of inputs of the original MLM.
 3. The method of claim 2, wherein the ranking is based on a multimodal Gaussian distribution that includes three or more peaks, wherein the multimodal Gaussian distribution is generated based on a variant analysis during a backpropagation operation of each input of the original MLM, wherein the reduction of the plurality of inputs is based on one or more results of the variant analysis.
 4. The method of claim 3, wherein the variant analysis of the backpropagation operation sums each weight along a path of each input of the original MLM, wherein the first input is removed based on the summed weights of each path.
 5. The method of claim 4, wherein the variant analysis further comprises normalizing each summed weight.
 6. The method of claim 1, further comprising: comparing the size of the reduced input MLM to the memory threshold to determine the reduced input MLM exceeds the memory threshold of the target device.
 7. The method of claim 1, wherein the memory threshold comprises a memory capacity of the target device.
 8. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that, when executed by a processor, cause the processor to: generate a reduced input machine learning model (MLM) based on training an original MLM, wherein generating the reduced input MLM comprises: training the original MLM using a first dataset, to generate a plurality of weights for the original MLM; pruning the reduced input MLM such that a first input of a plurality of inputs of the original MLM is not included as an input of the reduced input MLM; determining that a size of the reduced input MLM exceeds a memory threshold of a target device; and based on the determination that the size of the reduced input MLM is greater than the memory threshold of the target device, pruning the reduced input MLM to remove a second input of the plurality of inputs of the original MLM included in the reduced input MLM to generate a second reduced MLM; and process, by the second reduced MLM executing on the processor, an applied dataset.
 9. The computer-readable storage medium of claim 8, wherein pruning the reduced input MLM comprises: automatically ranking each of the plurality of inputs of the original MLM.
 10. The computer-readable storage medium of claim 9, wherein the ranking is based on a multimodal Gaussian distribution that includes three or more peaks, wherein the multimodal Gaussian distribution is generated based on a variant analysis during a backpropagation operation of each input of the original MLM, wherein the reduction of the plurality of inputs is based on one or more results of the variant analysis.
 11. The computer-readable storage medium of claim 10, wherein the variant analysis of the backpropagation operation sums each weight along a path of each input of the original MLM, wherein the first input is removed based on the summed weights of each path.
 12. The computer-readable storage medium of claim 11, wherein the variant analysis further comprises normalizing each summed weight.
 13. The computer-readable storage medium of claim 8, wherein the instructions further cause the processor to: compare the size of the reduced input MLM to the memory threshold to determine the reduced input MLM exceeds the memory threshold of the target device.
 14. The computer-readable storage medium of claim 8, wherein the memory threshold comprises a memory capacity of the target device.
 15. A computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: generate a reduced input machine learning model (MLM) based on training an original MLM, wherein generating the reduced input MLM comprises: training the original MLM using a first dataset, to generate a plurality of weights for the original MLM; pruning the reduced input MLM such that a first input of a plurality of inputs of the original MLM is not included as an input of the reduced input MLM; determining that a size of the reduced input MLM exceeds a memory threshold of a target device; and based on the determination that the size of the reduced input MLM is greater than the memory threshold of the target device, pruning the reduced input MLM to remove a second input of the plurality of inputs of the original MLM included in the reduced input MLM to generate a second reduced MLM; and process, by the second reduced MLM executing on the processor, an applied dataset.
 16. The computing apparatus of claim 15, wherein pruning the reduced input MLM comprises: automatically ranking each of the plurality of inputs of the original MLM.
 17. The computing apparatus of claim 16, wherein the ranking is based on a multimodal Gaussian distribution that includes three or more peaks, wherein the multimodal Gaussian distribution is generated based on a variant analysis during a backpropagation operation of each input of the original MLM, wherein the reduction of the plurality of inputs is based on one or more results of the variant analysis.
 18. The computing apparatus of claim 17, wherein the variant analysis of the backpropagation operation sums each weight along a path of each input of the original MLM, wherein the first input is removed based on the summed weights of each path.
 19. The computing apparatus of claim 18, wherein the variant analysis further comprises normalizing each summed weight.
 20. The computing apparatus of claim 15, wherein the memory threshold comprises a memory capacity of the target device, wherein the instructions further cause the processor to: compare the size of the reduced input MLM to the memory threshold to determine the reduced input MLM exceeds the memory threshold of the target device.
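The following listing is a purely illustrative, non-limiting sketch of the two-stage input pruning recited in claims 1-7 and the path-weight ranking of claims 3-5; it is not a description of any claimed embodiment. The toy one-hidden-layer network, the simple absolute-weight scoring rule standing in for the claimed variant analysis, and names such as rank_inputs, prune_lowest, and MEMORY_THRESHOLD_BYTES are hypothetical.

    # Minimal, illustrative sketch (not the claimed implementation) of
    # two-stage input pruning: rank inputs by summed, normalized path
    # weights, prune one input, prune a second if the reduced model still
    # exceeds a target device's memory threshold, then run the result.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy "original MLM": one hidden layer whose weights stand in for a
    # model already trained on the first dataset.
    n_inputs, n_hidden, n_outputs = 8, 16, 1
    W1 = rng.normal(size=(n_inputs, n_hidden))   # input -> hidden weights
    W2 = rng.normal(size=(n_hidden, n_outputs))  # hidden -> output weights

    def rank_inputs(W1, W2):
        """Sum the weights along every path rooted at each input, then
        normalize the summed scores (claims 3-5, loosely interpreted)."""
        scores = np.zeros(W1.shape[0])
        for i in range(W1.shape[0]):
            # Each path i -> hidden j -> output k carries |W1[i,j]| + |W2[j,k]|.
            path_sums = np.abs(W1[i, :, None]) + np.abs(W2)
            scores[i] = path_sums.sum()
        return scores / scores.sum()             # normalized importance per input

    def prune_lowest(keep_mask, scores):
        """Remove the lowest-ranked input that is still active."""
        keep_mask = keep_mask.copy()
        active = np.flatnonzero(keep_mask)
        keep_mask[active[np.argmin(scores[active])]] = False
        return keep_mask

    def model_size_bytes(W1, W2, keep_mask):
        """Approximate footprint once pruned input rows are physically removed."""
        return W1[keep_mask].nbytes + W2.nbytes

    MEMORY_THRESHOLD_BYTES = 1000                 # hypothetical target-device capacity

    scores = rank_inputs(W1, W2)
    keep = np.ones(n_inputs, dtype=bool)

    # First pruning pass: drop one input to form the reduced input MLM.
    keep = prune_lowest(keep, scores)

    # If the reduced model still exceeds the target device's memory
    # capacity, drop a second input to form the second reduced MLM.
    if model_size_bytes(W1, W2, keep) > MEMORY_THRESHOLD_BYTES:
        keep = prune_lowest(keep, scores)

    # Finally, "process an applied dataset" with the second reduced model.
    X_applied = rng.normal(size=(5, n_inputs))
    hidden = np.maximum(X_applied[:, keep] @ W1[keep], 0.0)   # ReLU hidden layer
    print("kept inputs:", np.flatnonzero(keep), "output shape:", (hidden @ W2).shape)

The sketch fixes only the order of operations: rank the inputs by summed and normalized path weights, remove the lowest-ranked input, compare the reduced model's size against the target device's memory capacity, remove a second input if that capacity is exceeded, and run the second reduced model on an applied dataset.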